Regret Testing: A Simple Payoff-Based Procedure for Learning Nash Equilibrium∗
Author

Abstract
A learning rule is uncoupled if a player does not condition his strategy on the opponent's payoffs. It is radically uncoupled if the player does not condition his strategy on the opponent's actions or payoffs. We demonstrate a simple class of radically uncoupled learning rules, patterned after aspiration learning models, whose period-by-period behavior comes arbitrarily close to Nash equilibrium behavior in any finite two-person game.

1 Payoff-based learning rules

In this paper we propose a class of simple, adaptive learning rules that depend only on players' realized payoffs, such that when two players employ a rule from this class their period-by-period strategic behavior approximates Nash equilibrium behavior. Like reinforcement and aspiration models, this type of rule depends only on summary statistics derived from the players' received payoffs; indeed, the players do not even need to know they are involved in a game in order to learn equilibrium eventually. To position our contribution with respect to the recent literature, we need to consider three separate issues: i) the amount of information needed to implement a learning rule; ii) the type of equilibrium to which the learning process tends (Nash, correlated, etc.); iii) the sense in which the process can be said to "approximate" the type of equilibrium behavior in question. (For a further discussion of these issues, see Young, 2004.) Consider, for example, the recently discovered regret matching rules of Hart and Mas-Colell (2000, 2001). The essential idea is that players randomize among actions in proportion to their regrets from not having played those actions in the past. Like the regret-testing rules we introduce here, ...

[Footnote] See, for example, Bush and Mosteller, 1955; Erev and Roth, 1998; Karandikar, Mookherjee, Ray, and Vega-Redondo, 1998; Börgers and Sarin, 2000; and Bendor, Mookherjee, and Ray, 2001.
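The regret-matching idea attributed above to Hart and Mas-Colell can be sketched in code. The following is a minimal illustrative simulation of that idea in its simplest (unconditional-regret) form; it is not the regret-testing rule of this paper, and, unlike the radically uncoupled rules studied here, it assumes each player knows his own payoff matrix. The function names and the matching-pennies test game are our own illustrative choices.

```python
import random

def regret_matching_mix(cum_regret):
    """Mixed strategy proportional to positive cumulative regrets;
    uniform when no regret is positive (e.g. in the first period)."""
    pos = [max(r, 0.0) for r in cum_regret]
    s = sum(pos)
    if s == 0:
        return [1.0 / len(pos)] * len(pos)
    return [p / s for p in pos]

def play(payoffs_a, payoffs_b, periods=20000, seed=0):
    """Simulate two regret-matching players on a bimatrix game.

    payoffs_a[i][j]: row player's payoff when he plays i and the
    column player plays j; payoffs_b[i][j]: column player's payoff.
    Returns the empirical action frequencies of each player."""
    rng = random.Random(seed)
    n, m = len(payoffs_a), len(payoffs_a[0])
    reg_a, reg_b = [0.0] * n, [0.0] * m
    freq_a, freq_b = [0] * n, [0] * m
    for _ in range(periods):
        i = rng.choices(range(n), regret_matching_mix(reg_a))[0]
        j = rng.choices(range(m), regret_matching_mix(reg_b))[0]
        freq_a[i] += 1
        freq_b[j] += 1
        # Unconditional regret: payoff forgone by not having played
        # each alternative action against the opponent's realized action.
        for k in range(n):
            reg_a[k] += payoffs_a[k][j] - payoffs_a[i][j]
        for k in range(m):
            reg_b[k] += payoffs_b[i][k] - payoffs_b[i][j]
    t = float(periods)
    return [f / t for f in freq_a], [f / t for f in freq_b]
```

In matching pennies, a zero-sum game whose unique Nash equilibrium mixes 50/50, the empirical frequencies produced by this sketch approach (1/2, 1/2) for both players, illustrating the time-average (rather than period-by-period) sense of convergence that the paper contrasts with its own notion of approximation.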
Similar Resources
Regret Testing: A Simple Payoff-Based Procedure for Learning Nash Equilibrium
A learning rule is uncoupled if a player does not condition his strategy on the opponent's payoffs. It is radically uncoupled if a player does not condition his strategy on the opponent's actions or payoffs. We demonstrate a family of simple, radically uncoupled learning rules whose period-by-period behavior comes arbitrarily close to Nash equilibrium behavior in any finite two-person game. Keywor...
Learning by trial and error
A person learns by trial and error if he occasionally tries out new strategies, rejecting choices that are erroneous in the sense that they do not lead to higher payoffs. In a game, however, strategies can become erroneous due to a change of behavior by someone else. Such passive errors may also trigger a search for new and better strategies, but the nature of the search is different than when ...
Regret testing: learning to play Nash equilibrium without knowing you have an opponent
A learning rule is uncoupled if a player does not condition his strategy on the opponent’s payoffs. It is radically uncoupled if a player does not condition his strategy on the opponent’s actions or payoffs. We demonstrate a family of simple, radically uncoupled learning rules whose period-by-period behavior comes arbitrarily close to Nash equilibrium behavior in any finite two-person game.
Global Nash convergence of Foster and Young's regret testing
We construct an uncoupled randomized strategy of repeated play such that, if every player plays according to it, mixed action profiles converge almost surely to a Nash equilibrium of the stage game. The strategy requires very little in terms of information about the game, as players’ actions are based only on their own past payoffs. Moreover, in a variant of the procedure, players need not know...
Unifying Convergence and No-Regret in Multiagent Learning
We present a new multiagent learning algorithm, RVσ(t), that builds on an earlier version, ReDVaLeR. ReDVaLeR could guarantee (a) convergence to best response against stationary opponents and either (b) constant bounded regret against arbitrary opponents, or (c) convergence to Nash equilibrium policies in self-play. But it makes two strong assumptions: (1) that it can distinguish between self-...
Journal:
Volume, Issue:
Pages: -
Publication date: 2004